Abstract

We show that a set of gates that consists of all one-bit quantum gates ($U(2)$)
and the two-bit exclusive-or gate (that maps Boolean values $(x, y)$ to $(x, x \oplus y)$)
is universal in the sense that all unitary operations on arbitrarily many bits $n$
($U(2^n)$) can be expressed as compositions of these gates. We investigate the
number of the above gates required to implement other gates, such as generalized
Deutsch-Toffoli gates, that apply a specific $U(2)$ transformation to one
input bit if and only if the logical AND of all remaining input bits is satisfied.
These gates play a central role in many proposed constructions of quantum
computational networks. We derive upper and lower bounds on the exact number
of elementary gates required to build up a variety of two- and three-bit quantum
gates, the asymptotic number required for n-bit Deutsch-Toffoli gates, and
make some observations about the number required for arbitrary n-bit unitary
operations.

1 Background

It has recently been recognized, after fifty years of using the paradigms of classical
physics (as embodied in the Turing machine) to build a theory of computation, that
quantum physics provides another paradigm with clearly different and possibly much
more powerful features than established computational theory. In quantum computation,
the state of the computer is described by a state vector which is a complex
linear superposition of all binary states of the bits $x_m \in \{0, 1\}$:

$$\Psi(t) = \sum_{x \in \{0,1\}^m} \alpha_x \, |x_1, \ldots, x_m\rangle.$$

The state's evolution in the course of time $t$ is described by a unitary operator $U$ on
this vector space, i.e., a linear transformation which is bijective and length-preserving.
This unitary evolution on a normalized state vector is known to be the correct physical
description of an isolated system evolving in time according to the laws of quantum
mechanics [1].

Historically, the idea that the quantum mechanics of isolated systems should be
studied as a new formal system for computation arose from the recognition twenty
years ago that computation could be made reversible within the paradigm of classical
physics. It is possible to perform any computation in a way that is reversible
both logically (i.e., the computation is a sequence of bijective transformations) and
thermodynamically (i.e., the computation could in principle be performed by a physical
apparatus dissipating arbitrarily little energy) [2]. A formalism for constructing
reversible Turing machines and reversible gate arrays (i.e., reversible combinational
logic) was developed. Fredkin and Toffoli [3] showed that there exists a 3-bit "universal
gate" for reversible computation, that is, a gate which, when applied in succession
to different triplets of bits in a gate array, could be used to simulate any arbitrary
reversible computation. (Two-bit gates like NAND which are universal for ordinary
computation are not reversible.) Toffoli's version [4] of the universal reversible gate
will figure prominently in the body of this paper.

Quantum physics is also reversible, because the reverse-time evolution specified by
the unitary operator $U^{-1} = U^\dagger$ always exists; as a consequence, several workers
recognized that reversible computation could be executed within a quantum-mechanical
system. Quantum-mechanical Turing machines [5], [6], gate arrays [7], and cellular
automata [8] have been discussed, and physical realizations of Toffoli's [9], [10], [11]
and Fredkin's [12], [13], [14] universal three-bit gates within various quantum-mechanical
physical systems have been proposed.

While reversible computation is contained within quantum mechanics, it is a small
subset: the time evolution of a classical reversible computer is described by unitary
operators whose matrix elements are only zero or one; arbitrary complex numbers
are not allowed. Unitary time evolution can of course be simulated by a classical
computer (e.g., an analog optical computer governed by Maxwell's equations) [15], but
the dimension of the unitary operator thus attainable is bounded by the number of
classical degrees of freedom, i.e., roughly proportional to the size of the apparatus.
By contrast, a quantum computer with $m$ physical bits (see the definition of the state
above) can perform unitary operations in a space of $2^m$ dimensions, exponentially
larger than its physical size.

Deutsch [16] introduced a quantum Turing machine intended to generate and operate
on arbitrary superpositions of states, and proposed that, aside from simulating
the evolution of quantum systems more economically than known classical methods,
it might also be able to solve certain classical problems (i.e., problems with a
classical input and output) faster than on any classical Turing machine. In a series
of artificial settings, with appropriately chosen oracles, quantum computers were
shown to be qualitatively stronger than classical ones [17], [18], [?], [21], culminating in
Shor's [22] discovery of quantum polynomial time algorithms for two important
natural problems, viz. factoring and discrete logarithm, for which no polynomial-time
classical algorithm was known. The search for other such problems, and the physical
question of the feasibility of building a quantum computer, are major topics of
investigation today [23].

The formalism we use for quantum computation, which we call a quantum "gate
array", was introduced by Deutsch [24], who showed that a simple generalization of
the Toffoli gate (the three-bit gate $\wedge_2(R_x)$, in the language introduced later in this
paper) suffices as a universal gate for quantum computing. The quantum gate array is
the natural quantum generalization of acyclic combinational logic "circuits" studied
in conventional computational complexity theory. It consists of quantum "gates",
interconnected without fanout or feedback by quantum "wires". The gates have the
same number of inputs as outputs, and a gate of $n$ inputs carries a unitary operation
of the group $U(2^n)$, i.e., a generalized rotation in a Hilbert space of dimension $2^n$.
Each wire represents a quantum bit, or qubit [25], [26], i.e., a quantum system with a
2-dimensional Hilbert space, capable of existing in a superposition of Boolean states
and of being entangled with the states of other qubits. Where there is no danger of
confusion, we will use the term "bit" in either the classical or quantum sense. Just as
classical bit strings can represent the discrete states of arbitrary finite dimensionality,
so a string of $n$ qubits can be used to represent quantum states in any Hilbert space of
dimensionality up to $2^n$. The analysis of quantum Turing machines [?] is complicated
by the fact that not only the data but also the control variables, e.g., head position,
can exist in a superposition of classical states. Fortunately, Yao has shown [27] that
acyclic quantum gate arrays can simulate quantum Turing machines. Gate arrays are
easier to think about, since the control variables, i.e., the wiring diagram itself and
the number of steps of computation executed so far, can be thought of as classical,
with only the data in the wires being quantum.

Here we derive a series of results which provide new tools for the building-up of
unitary transformations from simple gates. We build on other recent results which
simplify and extend Deutsch's original discovery [24] of a three-bit universal quantum
logic gate. As a consequence of the greater power of quantum computing as a formal
system, there are many more choices for the universal gate than in classical reversible
computing. In particular, DiVincenzo [28] showed that two-bit universal quantum
gates are also possible; Barenco [29] extended this to show that almost any two-bit
gate (within a certain restricted class) is universal, and Lloyd [30] and Deutsch et
al. [31] have shown that in fact almost any two-bit or n-bit ($n > 2$) gate is also
universal. A closely related construction for the Fredkin gate has been given [32]. In
the present paper we take a somewhat different tack, showing that a non-universal,
classical two-bit gate, in conjunction with quantum one-bit gates, is also universal;
we believe that the present work, along with the preceding ones, covers the full range
of possible repertoires for quantum gate array construction.

With our universal-gate repertoire, we also exhibit a number of efficient schemes
for building up certain classes of n-bit operations with these gates. A variety of
strategies for constructing gate arrays efficiently will surely be very important for
understanding the full power of quantum mechanics for computation; the construction of
such efficient schemes has already proved very useful for understanding the scaling of
Shor's prime factorization [33]. In the present work we in part build upon the strategy
introduced by Sleator and Weinfurter [?], who exhibited a scheme for obtaining the
Toffoli gate with a sequence of exactly five two-bit gates. We find that their approach
can be generalized and extended in a number of ways to obtain more general efficient
gate constructions. Some of the results presented here have no obvious connection
with previous gate-assembly schemes.

We will not touch at all on the great difficulties attendant on the actual physical
realization of a quantum computer; the problems of error correction [34] and
quantum coherence [35], [36] are very serious ones. We refer the reader to [37] for a
comprehensive discussion of these difficulties.

2 Introduction



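We begin by introducing some basic ideas and notation. For any unitary

$$U = \begin{pmatrix} u_{00} & u_{01} \\ u_{10} & u_{11} \end{pmatrix}$$

and $m \in \{0, 1, 2, \ldots\}$, define the $(m+1)$-bit ($2^{m+1}$-dimensional) operator $\wedge_m(U)$ as

$$\wedge_m(U)\,|x_1, \ldots, x_m, y\rangle =
\begin{cases}
u_{y0}\,|x_1, \ldots, x_m, 0\rangle + u_{y1}\,|x_1, \ldots, x_m, 1\rangle & \text{if } \bigwedge_{k=1}^{m} x_k = 1, \\
|x_1, \ldots, x_m, y\rangle & \text{if } \bigwedge_{k=1}^{m} x_k = 0,
\end{cases}$$

for all $x_1, \ldots, x_m, y \in \{0, 1\}$. (In more ordinary language, $\bigwedge_{k=1}^{m} x_k$ denotes the AND of
the boolean variables $\{x_k\}$.) Note that $\wedge_0(U)$ is equated with $U$. The $2^{m+1} \times 2^{m+1}$
matrix corresponding to $\wedge_m(U)$ is

$$\begin{pmatrix}
1 & & & & \\
 & \ddots & & & \\
 & & 1 & & \\
 & & & u_{00} & u_{01} \\
 & & & u_{10} & u_{11}
\end{pmatrix}$$

(where the basis states are lexicographically ordered, i.e., $|000\rangle, |001\rangle, \ldots, |111\rangle$).
When

$$U = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},$$

$\wedge_m(U)$ is the so-called Toffoli gate [4] with $m + 1$ input bits, which maps $|x_1, \ldots, x_m, y\rangle$
to $|x_1, \ldots, x_m, (\bigwedge_{k=1}^{m} x_k) \oplus y\rangle$. For a general $U$, $\wedge_m(U)$ can be regarded as a
generalization of the Toffoli gate, which, on input $|x_1, \ldots, x_m, y\rangle$, applies $U$ to $y$ if and only
if $\bigwedge_{k=1}^{m} x_k = 1$.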
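As a concrete illustration of the block structure just described, the following minimal sketch builds the matrix of $\wedge_m(U)$ as nested Python lists. The function name `wedge` and the list-of-rows representation are choices made here for illustration only; they are not notation from the paper.

```python
def wedge(m, u):
    """Matrix of the (m+1)-bit gate wedge_m(U), as a list of rows.

    Basis states are ordered lexicographically, so the target bit is the
    least significant one and U occupies the bottom-right 2x2 block.
    """
    dim = 2 ** (m + 1)
    mat = [[0j] * dim for _ in range(dim)]
    for i in range(dim - 2):
        mat[i][i] = 1 + 0j  # some control bit is 0: basis state is unchanged
    for r in range(2):
        for c in range(2):
            # all m control bits are 1: U acts on the target bit
            mat[dim - 2 + r][dim - 2 + c] = complex(u[r][c])
    return mat


SIGMA_X = [[0, 1], [1, 0]]
toffoli = wedge(2, SIGMA_X)  # 8x8 matrix that exchanges |110> and |111>
```

With `m = 0` the function returns `U` itself, matching the convention that $\wedge_0(U)$ is equated with $U$.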

As shown by one of us [?], [29], "almost any" single $\wedge_1(U)$ gate is universal in the
sense that, by successive application of this gate to pairs of bits in an n-bit network,
any unitary transformation may be approximated with arbitrary accuracy. (It suffices
for $U$ to be specified by Euler angles which are not a rational multiple of $\pi$.)

We show that in some sense this result can be made even simpler, in that any unitary
transformation in a network can always be constructed out of only the "classical"
two-bit gate $\wedge_1\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ along with a set of one-bit operations (of the form $\wedge_0(U)$).
This is a remarkable result from the perspective of classical reversible computation
because it is well known that the classical analogue of this assertion (which is that
all invertible boolean functions can be implemented with $\wedge_1\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ and $\wedge_0\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$
gates [38]) is false. In fact, it is well known that only a tiny fraction of Boolean functions
(those which are linear with respect to modulo 2 arithmetic) can be generated
with these gates [39].

We will also exhibit a number of explicit constructions of $\wedge_m(U)$ using $\wedge_1(U)$,
which can all be made polynomial in $m$. It is well known [4] that the analogous family
of constructions in classical reversible logic, which involve building $\wedge_m\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ from
the three-bit Toffoli gate $\wedge_2\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, is also polynomial in $m$. We will exhibit one
important difference between the classical and the quantum constructions, however:
Toffoli showed [4] that the classical $\wedge_m$'s could not be built without the presence of
some "work bits" to store intermediate results of the calculation. By contrast, we show
that the quantum logic gates can always be constructed with the use of no workspace
whatsoever. Similar computations in the classical setting (that use very few or no
work bits) appeared in the work of Cleve [40] and Ben-Or and Cleve [41]. Still, the
presence of a workspace plays an important role in the quantum gate constructions:
we find that to implement a family of $\wedge_m$ gates exactly, the time required for our
implementation can be reduced from $\Theta(m^2)$ to $\Theta(m)$ merely by the introduction of
one bit for workspace.

3 Notation 

We adopt a version of Feynman's [7] notation to denote $\wedge_m(U)$ gates and Toffoli gates
in quantum networks as follows.

[Figure: four networks showing, from left to right, a $\wedge_2(U)$ gate, a 3-bit Toffoli
gate, a $\wedge_0(U)$ gate, and a 2-bit XOR gate.]

The first network contains a $\wedge_2(U)$ gate and the second one contains a 3-bit Toffoli gate [42].
The third and fourth networks contain a $\wedge_0(U)$ and a 2-bit reversible exclusive-or
(simply called XOR henceforth) gate, respectively. The XOR gate is introduced as
the "measurement gate" in [24], and will play a very prominent role in many of the
constructions we describe below. Throughout this paper, when we refer to a basic
operation, we mean either a $\wedge_0(U)$ gate or this 2-bit XOR gate.

In all the gate-array diagrams shown in this paper, we use the usual convention
that time advances from left to right, so that the left-most gate operates first, etc.



4 Matrix Properties 



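Lemma 4.1: Every unitary $2 \times 2$ matrix can be expressed as

$$\begin{pmatrix} e^{i\delta} & 0 \\ 0 & e^{i\delta} \end{pmatrix} \cdot
\begin{pmatrix} e^{i\alpha/2} & 0 \\ 0 & e^{-i\alpha/2} \end{pmatrix} \cdot
\begin{pmatrix} \cos\theta/2 & \sin\theta/2 \\ -\sin\theta/2 & \cos\theta/2 \end{pmatrix} \cdot
\begin{pmatrix} e^{i\beta/2} & 0 \\ 0 & e^{-i\beta/2} \end{pmatrix},$$

where $\delta$, $\alpha$, $\theta$, and $\beta$ are real-valued. Moreover, any special unitary $2 \times 2$ matrix (i.e.,
with unity determinant) can be expressed as

$$\begin{pmatrix} e^{i\alpha/2} & 0 \\ 0 & e^{-i\alpha/2} \end{pmatrix} \cdot
\begin{pmatrix} \cos\theta/2 & \sin\theta/2 \\ -\sin\theta/2 & \cos\theta/2 \end{pmatrix} \cdot
\begin{pmatrix} e^{i\beta/2} & 0 \\ 0 & e^{-i\beta/2} \end{pmatrix}.$$

Proof: Since a matrix is unitary if and only if its row vectors and column vectors
are orthonormal, every $2 \times 2$ unitary matrix is of the form

$$\begin{pmatrix}
e^{i(\delta+\alpha/2+\beta/2)} \cos\theta/2 & e^{i(\delta+\alpha/2-\beta/2)} \sin\theta/2 \\
-e^{i(\delta-\alpha/2+\beta/2)} \sin\theta/2 & e^{i(\delta-\alpha/2-\beta/2)} \cos\theta/2
\end{pmatrix},$$

where $\delta$, $\alpha$, $\theta$, and $\beta$ are real-valued. The first factorization above now follows
immediately. In the case of special unitary matrices, the determinant of the first matrix
must be 1, which implies $e^{i\delta} = \pm 1$, so the first matrix in the product can be absorbed
into the second one. □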
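The factorization of Lemma 4.1 can also be computed numerically. The sketch below is one possible way to extract the four angles from a given unitary; the helper names (`decompose`, `rebuild`, and so on) are hypothetical, and the choice $\theta \in [0, \pi]$ is an assumption made here to fix one of the many equivalent parameter sets.

```python
import cmath
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def ph(delta):
    return [[cmath.exp(1j * delta), 0], [0, cmath.exp(1j * delta)]]


def rz(alpha):
    return [[cmath.exp(1j * alpha / 2), 0], [0, cmath.exp(-1j * alpha / 2)]]


def ry(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, s], [-s, c]]


def decompose(u):
    """Return (delta, alpha, theta, beta) with u = Ph(delta) Rz(alpha) Ry(theta) Rz(beta)."""
    det = u[0][0] * u[1][1] - u[0][1] * u[1][0]
    delta = cmath.phase(det) / 2            # det(u) = exp(2i * delta)
    w00 = u[0][0] * cmath.exp(-1j * delta)  # first row of the special unitary part
    w01 = u[0][1] * cmath.exp(-1j * delta)
    theta = 2 * math.atan2(abs(w01), abs(w00))
    half_sum = cmath.phase(w00)             # (alpha + beta) / 2
    half_diff = cmath.phase(w01)            # (alpha - beta) / 2
    return delta, half_sum + half_diff, theta, half_sum - half_diff


def rebuild(delta, alpha, theta, beta):
    return matmul(ph(delta), matmul(rz(alpha), matmul(ry(theta), rz(beta))))


# Example: the Hadamard matrix, which has determinant -1.
H = [[1 / math.sqrt(2), 1 / math.sqrt(2)], [1 / math.sqrt(2), -1 / math.sqrt(2)]]
params = decompose(H)
error = max(abs(rebuild(*params)[i][j] - H[i][j]) for i in range(2) for j in range(2))
```

The returned angles are not unique (for instance, $\delta$ is only determined modulo $\pi$), so the check compares the rebuilt matrix with the input rather than comparing angles.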

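Definition: In view of the above lemma, we define the following.

• $R_y(\theta) = \begin{pmatrix} \cos\theta/2 & \sin\theta/2 \\ -\sin\theta/2 & \cos\theta/2 \end{pmatrix}$ (a rotation by $\theta$ around $\hat{y}$ [43]).

• $R_z(\alpha) = \begin{pmatrix} e^{i\alpha/2} & 0 \\ 0 & e^{-i\alpha/2} \end{pmatrix}$ (a rotation by $\alpha$ around $\hat{z}$).

• $\mathrm{Ph}(\delta) = \begin{pmatrix} e^{i\delta} & 0 \\ 0 & e^{i\delta} \end{pmatrix}$ (a phase-shift with respect to $\delta$).

• $\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ (a "negation", or Pauli matrix).

• $I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ (the identity matrix).

Lemma 4.2: The following properties hold:

1. $R_y(\theta_1) \cdot R_y(\theta_2) = R_y(\theta_1 + \theta_2)$

2. $R_z(\alpha_1) \cdot R_z(\alpha_2) = R_z(\alpha_1 + \alpha_2)$

3. $\mathrm{Ph}(\delta_1) \cdot \mathrm{Ph}(\delta_2) = \mathrm{Ph}(\delta_1 + \delta_2)$

4. $\sigma_x \cdot \sigma_x = I$

5. $\sigma_x \cdot R_y(\theta) \cdot \sigma_x = R_y(-\theta)$

6. $\sigma_x \cdot R_z(\alpha) \cdot \sigma_x = R_z(-\alpha)$

Lemma 4.3: For any special unitary matrix $W$ ($W \in SU(2)$), there exist matrices
$A$, $B$, and $C \in SU(2)$ such that $A \cdot B \cdot C = I$ and $A \cdot \sigma_x \cdot B \cdot \sigma_x \cdot C = W$.

Proof: By Lemma 4.1, there exist $\alpha$, $\theta$, and $\beta$ such that $W = R_z(\alpha) \cdot R_y(\theta) \cdot R_z(\beta)$.
Set $A = R_z(\alpha) \cdot R_y(\frac{\theta}{2})$, $B = R_y(-\frac{\theta}{2}) \cdot R_z(-\frac{\alpha+\beta}{2})$, and $C = R_z(\frac{\beta-\alpha}{2})$. Then

$$\begin{aligned}
A \cdot B \cdot C &= R_z(\alpha) \cdot R_y(\tfrac{\theta}{2}) \cdot R_y(-\tfrac{\theta}{2}) \cdot R_z(-\tfrac{\alpha+\beta}{2}) \cdot R_z(\tfrac{\beta-\alpha}{2}) \\
&= R_z(\alpha) \cdot R_z(-\alpha) \\
&= I,
\end{aligned}$$

and

$$\begin{aligned}
A \cdot \sigma_x \cdot B \cdot \sigma_x \cdot C
&= R_z(\alpha) \cdot R_y(\tfrac{\theta}{2}) \cdot \sigma_x \cdot R_y(-\tfrac{\theta}{2}) \cdot R_z(-\tfrac{\alpha+\beta}{2}) \cdot \sigma_x \cdot R_z(\tfrac{\beta-\alpha}{2}) \\
&= R_z(\alpha) \cdot R_y(\tfrac{\theta}{2}) \cdot \sigma_x \cdot R_y(-\tfrac{\theta}{2}) \cdot \sigma_x \cdot \sigma_x \cdot R_z(-\tfrac{\alpha+\beta}{2}) \cdot \sigma_x \cdot R_z(\tfrac{\beta-\alpha}{2}) \\
&= R_z(\alpha) \cdot R_y(\tfrac{\theta}{2}) \cdot R_y(\tfrac{\theta}{2}) \cdot R_z(\tfrac{\alpha+\beta}{2}) \cdot R_z(\tfrac{\beta-\alpha}{2}) \\
&= R_z(\alpha) \cdot R_y(\theta) \cdot R_z(\beta) \\
&= W.
\end{aligned}$$

□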
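The two identities of Lemma 4.3 are easy to check numerically for particular angles. The following sketch does so with plain Python lists; the helper names and the sample angles are illustrative assumptions, and a numerical check of this kind is of course no substitute for the proof above.

```python
import cmath
import math


def mul(*ms):
    """Product of 2x2 matrices, written left to right."""
    out = [[1, 0], [0, 1]]
    for m in ms:
        out = [[sum(out[i][k] * m[k][j] for k in range(2)) for j in range(2)]
               for i in range(2)]
    return out


def rz(alpha):
    return [[cmath.exp(1j * alpha / 2), 0], [0, cmath.exp(-1j * alpha / 2)]]


def ry(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, s], [-s, c]]


SIGMA_X = [[0, 1], [1, 0]]
IDENTITY = [[1, 0], [0, 1]]


def abc(alpha, theta, beta):
    """The matrices A, B, C chosen in the proof of Lemma 4.3."""
    a = mul(rz(alpha), ry(theta / 2))
    b = mul(ry(-theta / 2), rz(-(alpha + beta) / 2))
    c = rz((beta - alpha) / 2)
    return a, b, c


def close(x, y, tol=1e-12):
    return all(abs(x[i][j] - y[i][j]) < tol for i in range(2) for j in range(2))


alpha, theta, beta = 0.3, 1.1, -0.7  # arbitrary sample angles
A, B, C = abc(alpha, theta, beta)
W = mul(rz(alpha), ry(theta), rz(beta))
identity_ok = close(mul(A, B, C), IDENTITY)
w_ok = close(mul(A, SIGMA_X, B, SIGMA_X, C), W)
```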

5 Two-Bit Networks 

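5.1 Simulation of General $\wedge_1(U)$ Gates

Lemma 5.1: For a unitary $2 \times 2$ matrix $W$, a $\wedge_1(W)$ gate can be simulated by a
network of the form

[Figure: a $\wedge_1(W)$ gate shown equal to a two-bit network in which the second bit passes
through the one-bit gates $A$, $B$, and $C$ (from left to right), with an XOR gate controlled by
the first bit between $A$ and $B$ and another between $B$ and $C$.]

where $A$, $B$, and $C \in SU(2)$, if and only if $W \in SU(2)$.

Proof: For the "if" part, let $A$, $B$, and $C$ be as in Lemma 4.3. If the value of the
first (top) bit is 0 then $A \cdot B \cdot C = I$ is applied to the second bit. If the value of the
first bit is 1 then $A \cdot \sigma_x \cdot B \cdot \sigma_x \cdot C = W$ is applied to the second bit.

For the "only if" part, note that $A \cdot B \cdot C = I$ must hold if the simulation is
correct when the first bit is 0. Also, if the network simulates a $\wedge_1(W)$ gate then
$A \cdot \sigma_x \cdot B \cdot \sigma_x \cdot C = W$. Therefore, since $\det(A \cdot \sigma_x \cdot B \cdot \sigma_x \cdot C) = 1$, $W$ must also be
special unitary. □

Lemma 5.2: For any $\delta$ and $S = \mathrm{Ph}(\delta)$, a $\wedge_1(S)$ gate can be simulated by a network
of the form

[Figure: a $\wedge_1(S)$ gate shown equal to a two-bit network consisting of a single one-bit
gate $E$ acting on the first bit.]

where $E$ is unitary.

Proof: Let

$$E = R_z(-\delta) \cdot \mathrm{Ph}(\tfrac{\delta}{2}) = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\delta} \end{pmatrix}.$$

Then the observation is that the $4 \times 4$ unitary matrix corresponding to each of the
above networks is

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & e^{i\delta} & 0 \\ 0 & 0 & 0 & e^{i\delta} \end{pmatrix}.$$

□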
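The observation in the proof of Lemma 5.2 can be reproduced by multiplying out the matrices. In the sketch below, `kron`, `controlled_phase`, and `e_gate` are hypothetical helper names; the first bit is taken to be the more significant one, in line with the lexicographic ordering of basis states used earlier.

```python
import cmath


def kron(a, b):
    """Kronecker product of two square matrices given as lists of rows."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]


def controlled_phase(delta):
    """4x4 matrix of wedge_1(Ph(delta)) in the basis |00>, |01>, |10>, |11>."""
    e = cmath.exp(1j * delta)
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, e, 0], [0, 0, 0, e]]


def e_gate(delta):
    """E = Rz(-delta) * Ph(delta / 2), multiplied out entry by entry."""
    half_phase = cmath.exp(1j * delta / 2)
    rz_diag = [cmath.exp(-1j * delta / 2), cmath.exp(1j * delta / 2)]  # Rz(-delta)
    return [[rz_diag[0] * half_phase, 0], [0, rz_diag[1] * half_phase]]


delta = 0.8  # arbitrary sample angle
network = kron(e_gate(delta), [[1, 0], [0, 1]])  # E on the first bit only
gate = controlled_phase(delta)
max_diff = max(abs(network[r][c] - gate[r][c]) for r in range(4) for c in range(4))
```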

Clearly, $\wedge_1(S)$ composed with $\wedge_1(W)$ yields $\wedge_1(S \cdot W)$. Thus, by noting that any
unitary matrix $U$ is of the form $U = S \cdot W$, where $S = \mathrm{Ph}(\delta)$ (for some $\delta$) and
$W \in SU(2)$, we obtain the following.

Corollary 5.3: For any unitary $2 \times 2$ matrix $U$, a $\wedge_1(U)$ gate can be simulated by
at most six basic gates: four 1-bit gates ($\wedge_0$), and two XOR gates ($\wedge_1(\sigma_x)$).

5.2 Special Cases 

In Section 5.1, we have established a general simulation of a $\wedge_1(U)$ gate for an arbitrary
unitary $U$. For special cases of $U$ that may be of interest, a more efficient
construction than that of Corollary 5.3 is possible. Clearly, Lemma 5.1 immediately
yields a more efficient simulation for all special unitary matrices. For example, the
"x-axis rotation matrix" (to use the language suggested by the mapping between
SU(2) and SO(3), the group of rigid-body rotations)

$$R_x(\theta) = \begin{pmatrix} \cos\theta/2 & i\sin\theta/2 \\ i\sin\theta/2 & \cos\theta/2 \end{pmatrix} = R_z(\tfrac{\pi}{2}) \cdot R_y(\theta) \cdot R_z(-\tfrac{\pi}{2})$$

is special unitary. ($R_x$ is of special interest because $\wedge_2(iR_x)$ is the "Deutsch gate" [24],
which was shown to be universal for quantum logic.) For other specific SU(2) matrices
an even more efficient simulation is possible.

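Lemma 5.4: A $\wedge_1(W)$ gate can be simulated by a network of the form

[Figure: a $\wedge_1(W)$ gate shown equal to a two-bit network in which the second bit passes
through the one-bit gates $A$ and $B$ (from left to right), each followed by an XOR gate
controlled by the first bit.]

where $A$ and $B \in SU(2)$, if and only if $W$ is of the form

$$W = R_z(\alpha) \cdot R_y(\theta) \cdot R_z(\alpha) = \begin{pmatrix} e^{i\alpha}\cos\theta/2 & \sin\theta/2 \\ -\sin\theta/2 & e^{-i\alpha}\cos\theta/2 \end{pmatrix},$$

where $\alpha$ and $\theta$ are real-valued.

Proof: For the "if" part, consider the simulation of $\wedge_1(W)$ that arises in Lemma 5.1
when $W = R_z(\alpha) \cdot R_y(\theta) \cdot R_z(\alpha)$. In this case, $A = R_z(\alpha) \cdot R_y(\frac{\theta}{2})$, $B = R_y(-\frac{\theta}{2}) \cdot R_z(-\alpha)$,
and $C = I$. Thus, $B = A^\dagger$ and $C$ can be omitted.

For the "only if" part, note that $B = A^\dagger$ must hold for the simulation to be valid
when the first bit is 0. Therefore, if the first bit is 1 then $A \cdot \sigma_x \cdot A^\dagger \cdot \sigma_x$ is applied
to the second bit. Now, the matrix $A \cdot \sigma_x \cdot A^\dagger$ has determinant $-1$ and is traceless (since
its trace is the same as that of $\sigma_x$). By specializing the characterization of unitary
matrices in Lemma 4.1 to traceless matrices with determinant $-1$, we conclude that
$A \cdot \sigma_x \cdot A^\dagger$ must be of the form

$$A \cdot \sigma_x \cdot A^\dagger = \begin{pmatrix} \sin\theta/2 & e^{i\alpha}\cos\theta/2 \\ e^{-i\alpha}\cos\theta/2 & -\sin\theta/2 \end{pmatrix}.$$

Therefore,

$$A \cdot \sigma_x \cdot A^\dagger \cdot \sigma_x = \begin{pmatrix} e^{i\alpha}\cos\theta/2 & \sin\theta/2 \\ -\sin\theta/2 & e^{-i\alpha}\cos\theta/2 \end{pmatrix},$$

as required. □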

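Examples of matrices of the form of Lemma 5.4 are $R_y(\theta)$ itself, as well as $R_z(\alpha) =
R_z(\frac{\alpha}{2}) \cdot R_y(0) \cdot R_z(\frac{\alpha}{2})$. However, $R_x(\theta)$ is not of this form.

Finally, for certain $U$, we obtain an even greater simplification of the simulation
of $\wedge_1(U)$ gates.

Lemma 5.5: A $\wedge_1(V)$ gate can be simulated by a construction of the form

[Figure: a $\wedge_1(V)$ gate shown equal to a two-bit network in which the second bit passes
through the one-bit gates $A$ and $B$, with a single XOR gate controlled by the first bit
between them.]

where $A$ and $B$ are unitary, if and only if $V$ is of the form

$$V = R_z(\alpha) \cdot R_y(\theta) \cdot R_z(\alpha) \cdot \sigma_x = \begin{pmatrix} \sin\theta/2 & e^{i\alpha}\cos\theta/2 \\ e^{-i\alpha}\cos\theta/2 & -\sin\theta/2 \end{pmatrix},$$

where $\alpha$ and $\theta$ are real-valued.

Proof: If an additional $\wedge_1(\sigma_x)$ is appended to the end of the network in Lemma 5.4,
then the network is equivalent to that above (since $\wedge_1(\sigma_x)$ is an involution), and also
simulates a $\wedge_1(W \cdot \sigma_x)$ gate (since $\wedge_1(W)$ composed with $\wedge_1(\sigma_x)$ is $\wedge_1(W \cdot \sigma_x)$). □

Examples of matrices of the form of Lemma 5.5 are the Pauli matrices

$$\sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} = R_z(\tfrac{\pi}{2}) \cdot R_y(2\pi) \cdot R_z(\tfrac{\pi}{2}) \cdot \sigma_x$$

and

$$\sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = R_z(0) \cdot R_y(\pi) \cdot R_z(0) \cdot \sigma_x$$

(as well as $\sigma_x$ itself).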

Lemma 5.5 permits an immediate generalization of Corollary 5.3:

Corollary 5.6: For any unitary $2 \times 2$ matrix $U$, a $\wedge_1(U)$ gate can be simulated by
at most six basic gates: four 1-bit gates ($\wedge_0$), and two gates ($\wedge_1(V)$), where $V$ is of
the form $V = R_z(\alpha) \cdot R_y(\theta) \cdot R_z(\alpha) \cdot \sigma_x$.

A particular feature of the $\wedge_1(\sigma_z)$ gates is that they are symmetric with respect
to their input bits. In view of this, as well as for future reference, we introduce the
following special notation for $\wedge_1(\sigma_z)$ gates.

[Figure: special notation for the $\wedge_1(\sigma_z)$ gate.]
6 Three-Bit Networks 

6.1 Simulation of General $\wedge_2(U)$ Gates

Lemma 6.1: For any unitary $2 \times 2$ matrix $U$, a $\wedge_2(U)$ gate can be simulated by a
network of the form

[Figure: a $\wedge_2(U)$ gate shown equal to a three-bit network built from $\wedge_1(V)$ and
$\wedge_1(V^\dagger)$ gates acting on the third bit and XOR gates acting on the first two bits.]

where $V$ is unitary.

Proof: Let $V$ be such that $V^2 = U$. If the first bit or the second bit are 0 then the
transformation applied to the third bit is either $I$ or $V \cdot V^\dagger = I$. If the first two bits
are both 1 then the transformation applied to the third is $V \cdot V = U$. □

Some of the intuition behind the construction in the above Lemma is that, when the
first two input bits are $x_1$ and $x_2$, the sequence of operations performed on the third
bit is: $V$ iff $x_1 = 1$, $V$ iff $x_2 = 1$, and $V^\dagger$ iff $x_1 \oplus x_2 = 1$. Since

$$x_1 + x_2 - (x_1 \oplus x_2) = 2 \cdot (x_1 \wedge x_2)$$

(where "+", "−", and "·" are the ordinary arithmetic operations), the above sequence
of operations is equivalent to performing $V^2$ on the third bit iff $x_1 \wedge x_2 = 1$, which is
the $\wedge_2(V^2)$ gate. (This approach generalizes to produce a simulation of $\wedge_m(V^{2^{m-1}})$,
for $m > 2$, which is considered in Section 7.)
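The counting argument above can be traced for classical values of the two control bits. The sketch below is an illustration under stated assumptions: it takes $U = \sigma_x$ with one particular square root $V$ (chosen here, not given in the text), and it multiplies the conditional operations in an arbitrary order, which is harmless because all of them are powers of $V$ and therefore commute.

```python
def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    """Hermitian conjugate of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


IDENTITY = [[1, 0], [0, 1]]
SIGMA_X = [[0, 1], [1, 0]]
# One square root of sigma_x (an illustrative choice): V * V = sigma_x.
V = [[(1 + 1j) / 2, (1 - 1j) / 2], [(1 - 1j) / 2, (1 + 1j) / 2]]


def applied_to_third_bit(x1, x2, v):
    """Net one-bit operator on the third bit for classical control values x1, x2."""
    steps = [(x2, v), (x1 ^ x2, dagger(v)), (x1, v)]
    out = IDENTITY
    for condition, gate in steps:
        if condition:
            out = mul(out, gate)
    return out


def close(x, y, tol=1e-12):
    return all(abs(x[i][j] - y[i][j]) < tol for i in range(2) for j in range(2))


lemma_holds = all(
    close(applied_to_third_bit(x1, x2, V), SIGMA_X if x1 and x2 else IDENTITY)
    for x1 in (0, 1) for x2 in (0, 1)
)
```

Because the network is linear, agreement on all classical control values is enough to conclude that the two operators agree on superpositions as well.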

We can now combine Lemma 6.1 with Corollary 5.3 to obtain a simulation of
$\wedge_2(U)$ using only basic gates ($\wedge_1(\sigma_x)$ and $\wedge_0$). The number of these gates is reduced
when it is recognized that a number of the one-bit gates can be merged and eliminated.
In particular, the $\wedge_0(C)$ from the end of the simulation of the first $\wedge_1(V)$ gate, and
the $\wedge_0(C^\dagger)$ from the $\wedge_1(V^\dagger)$ gate combine to form the identity and are eliminated
entirely. This same sort of merging occurs to eliminate a $\wedge_0(A)$ gate and a $\wedge_0(A^\dagger)$
gate. We arrive at the following count:

Corollary 6.2: For any unitary $2 \times 2$ matrix $U$, a $\wedge_2(U)$ gate can be simulated by
at most sixteen basic gates: eight 1-bit gates ($\wedge_0$) and eight XOR gates ($\wedge_1(\sigma_x)$).

A noteworthy case is when $U = \sigma_x$, where we obtain a simulation of the 3-bit Toffoli
gate $\wedge_2(\sigma_x)$, which is the primitive gate for classical reversible logic [4]. Later we will
use the fact that because $\wedge_2(\sigma_x)$ is its own inverse, either the simulation of Lemma
6.1 or the time-reversed simulation (in which the order of the gates is reversed, and
each unitary operator is replaced by its Hermitian conjugate) may be used.

6.2 Three-bit gates congruent to $\wedge_2(U)$

We now show that more efficient simulations of three-bit gates are possible if phase
shifts of the quantum states other than zero are permitted. If we define the matrix
$W$ as

$$W = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix},$$

then the gates $\wedge_2(W)$ and $\wedge_2(\sigma_x)$ can be regarded as being "congruent modulo phase
shifts", because $\wedge_2(W)$ differs from $\wedge_2(\sigma_x)$ only in that it maps $|111\rangle$ to $-|110\rangle$ (instead
of $|110\rangle$). This is perfectly acceptable if the gate is part of an operation which merely
mimics classical reversible computation, or if the gate is paired with another similar
one to cancel out the extra phase, as is sometimes the case in reversible gate
arrangements (see Corollary 7.4); however, this phase difference is dangerous in general
if non-classical unitary operations appear in the computation. Gates congruent to
$\wedge_2(\sigma_x)$ modulo phase shifts have been previously investigated in [?].

The following is a more efficient simulation of a gate congruent to $\wedge_2(\sigma_x)$ modulo
phase shifts:

[Figure: a 3-bit Toffoli gate shown congruent ($\cong$) to a three-bit network containing
one-bit gates labeled $A$.]

where $A = R_y(\frac{\pi}{4})$. In the above, the "$\cong$" indicates that the networks are not identical,
but differ at most in the phases of their amplitudes, which are all $\pm 1$ (the phase of
the $|101\rangle$ state is reversed in this case).

An alternative simulation of a gate congruent to $\wedge_2(\sigma_x)$ modulo phase shifts
(whose phase shifts are identical to the previous one) is given by

[Figure: a three-bit network containing the one-bit gates $B$ and $B^\dagger$.]

where $B = R_y(\frac{3\pi}{4})$.

7 n-Bit Networks



The technique for simulating $\wedge_2(U)$ gates in Lemma 6.1 generalizes to $\wedge_m(U)$ gates
for $m > 2$. For example, to simulate a $\wedge_3(U)$ gate for any unitary $U$, set $V$ so that
$V^4 = U$ and then construct a network as follows.

[Figure: a $\wedge_3(U)$ gate shown equal to a four-bit network of $\wedge_1(V)$ and $\wedge_1(V^\dagger)$
gates acting on the fourth bit, interleaved with XOR gates acting on the first three bits.]

The intuition behind this construction is similar to that behind the construction of
Lemma 6.1. If the first three input bits are $x_1$, $x_2$, and $x_3$ then the sequence of
operations performed on the fourth bit is:

    $V$          iff  $x_1 = 1$                           (100)
    $V^\dagger$  iff  $x_1 \oplus x_2 = 1$                (110)
    $V$          iff  $x_2 = 1$                           (010)
    $V^\dagger$  iff  $x_2 \oplus x_3 = 1$                (011)
    $V$          iff  $x_1 \oplus x_2 \oplus x_3 = 1$     (111)
    $V^\dagger$  iff  $x_1 \oplus x_3 = 1$                (101)
    $V$          iff  $x_3 = 1$                           (001).

The strings on the right encode the condition for the operation $V$ or $V^\dagger$ at each step:
the "1"'s indicate which input bits are involved in the condition. For an efficient
implementation of $\wedge_3(U)$, these strings form a Gray code sequence. Note also that the
parity of each bit string determines whether to apply $V$ or $V^\dagger$. By comparing this
sequence of operations with the terms in the equation

$$x_1 + x_2 + x_3 - (x_1 \oplus x_2) - (x_1 \oplus x_3) - (x_2 \oplus x_3) + (x_1 \oplus x_2 \oplus x_3) = 4 \cdot (x_1 \wedge x_2 \wedge x_3),$$

it can be verified that the above sequence of operations is equivalent to performing
$V^4$ on the fourth bit iff $x_1 \wedge x_2 \wedge x_3 = 1$, which is the $\wedge_3(V^4)$ gate.

The foregoing can be generalized to simulate $\wedge_m(U)$ for larger values of $m$.

Lemma 7.1: For any $n \geq 3$ and any unitary $2 \times 2$ matrix $U$, a $\wedge_{n-1}(U)$ gate can
be simulated by an n-bit network consisting of $2^{n-1} - 1$ $\wedge_1(V)$ and $\wedge_1(V^\dagger)$ gates and
$2^{n-1} - 2$ $\wedge_1(\sigma_x)$ gates, where $V$ is unitary.

We omit the proof of Lemma 7.1, but point out that it is a generalization of the
$n = 4$ case above and based on setting $V$ so that $V^{2^{n-2}} = U$ and "implementing" the
identity

$$\sum_{k_1} x_{k_1} - \sum_{k_1 < k_2} (x_{k_1} \oplus x_{k_2}) + \sum_{k_1 < k_2 < k_3} (x_{k_1} \oplus x_{k_2} \oplus x_{k_3}) - \cdots + (-1)^{m-1} (x_1 \oplus x_2 \oplus \cdots \oplus x_m)
= 2^{m-1} \cdot (x_1 \wedge x_2 \wedge \cdots \wedge x_m)$$

with a Gray-code sequence of operations.
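Both ingredients of this argument, the parity identity and the Gray-code ordering of the conditions, can be checked by brute force for small $m$. The following is a minimal sketch; the function names are hypothetical, and the standard reflected Gray code `k ^ (k >> 1)` is used here as one valid ordering (the text does not prescribe a specific one).

```python
from itertools import product


def gray_code(m):
    """All nonzero m-bit strings, as integers, in reflected Gray-code order."""
    return [k ^ (k >> 1) for k in range(1, 2 ** m)]


def signed_parity_sum(x):
    """Sum over nonempty subsets S of (-1)^(|S|+1) times the XOR of x_k, k in S."""
    m = len(x)
    total = 0
    for mask in gray_code(m):
        bits = [x[k] for k in range(m) if (mask >> k) & 1]
        sign = 1 if len(bits) % 2 else -1  # odd subsets pair with V, even with V^dagger
        total += sign * (sum(bits) % 2)
    return total


def identity_holds(m):
    """Check the identity for every assignment of the m control bits."""
    return all(
        signed_parity_sum(x) == 2 ** (m - 1) * min(x)  # min(x) is the AND of the bits
        for x in product((0, 1), repeat=m)
    )


def is_gray(seq):
    """True if consecutive entries differ in exactly one bit."""
    return all(bin(a ^ b).count("1") == 1 for a, b in zip(seq, seq[1:]))
```

Successive Gray-code strings differ in a single bit, which is why moving from one condition to the next costs only one XOR gate, giving the $2^{n-1} - 2$ XOR gates counted in Lemma 7.1.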

For some specific small values of $n$ (for $n = 3, 4, 5, 6, 7,$ and $8$), this is the most
efficient technique that we are aware of for simulating arbitrary $\wedge_{n-1}(U)$ gates as
well as $\wedge_{n-1}(\sigma_x)$ gates; taking account of mergers (see Corollary 6.2), the simulation
requires $3 \cdot 2^{n-1} - 4$ $\wedge_1(\sigma_x)$'s and $2 \cdot 2^{n-1}$ $\wedge_0$'s. However, since this number is $\Theta(2^n)$, the
simulation is very inefficient for large values of $n$. For the remainder of this section,
we focus on the asymptotic growth rate of the simulations with respect to $n$, and show
that this can be quadratic in the general case and linear in many cases of interest.

7.1 Linear Simulation of $\wedge_{n-2}(\sigma_x)$ Gates on n-Bit Networks

Lemma 7.2: If $n \geq 5$ and $m \in \{3, \ldots, \lceil n/2 \rceil\}$ then a $\wedge_m(\sigma_x)$ gate can be simulated
by a network consisting of $4(m - 2)$ $\wedge_2(\sigma_x)$ gates that is of the form

[Figure: a nine-bit network of $\wedge_2(\sigma_x)$ gates.]

(illustrated for $n = 9$ and $m = 5$).

Proof: Consider the group of the first 7 gates in the above network. The sixth bit
(from the top) is negated iff the first two bits are 1, the seventh bit is negated iff the
first three bits are 1, the eighth bit is negated iff the first four bits are 1, and the
ninth bit is negated iff the first five bits are 1. Thus, the last bit is correctly set, but
the three preceding bits are altered. The last 5 gates in the network reset the values
of these three preceding bits. □
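Since the network consists only of Toffoli gates, it can be simulated on classical bit strings. The sketch below builds one gate ordering that is consistent with the description in the proof (a sweep of $2m - 3$ gates that sets the target, followed by $2m - 5$ gates that restore the work bits). The original figure is not reproduced here, so this particular ordering, the wire numbering, and the function names are assumptions made for illustration.

```python
from itertools import product


def toffoli_network(m):
    """Gate list (control, control, target) on wires 0 .. 2m-2.

    Wires 0..m-1 are the controls, wires m..2m-3 are work bits in an
    arbitrary initial state, and wire 2m-2 is the target. Requires m >= 3.
    """
    c = list(range(m))
    w = list(range(m, 2 * m - 2))
    t = 2 * m - 2
    down = [(c[m - 1], w[m - 3], t)]
    down += [(c[i], w[i - 2], w[i - 1]) for i in range(m - 2, 1, -1)]
    down.append((c[0], c[1], w[0]))
    first = down + down[-2::-1]      # down-and-up sweep: flips t iff all controls are 1
    inner = down[1:]
    second = inner + inner[-2::-1]   # same sweep without the top gate: resets work bits
    return first + second


def run(gates, bits):
    """Apply classical Toffoli gates to a tuple of bits."""
    bits = list(bits)
    for a, b, target in gates:
        bits[target] ^= bits[a] & bits[b]
    return bits


def simulates_wedge(m):
    """Exhaustively check the network against wedge_m(sigma_x) on 2m-1 wires."""
    gates = toffoli_network(m)
    for bits in product((0, 1), repeat=2 * m - 1):
        expected = list(bits)
        expected[-1] ^= min(bits[:m])  # flip the target iff all m controls are 1
        if run(gates, bits) != expected:
            return False
    return True
```

For `m = 5` this produces 7 gates followed by 5 gates, 12 in total, in agreement with the count $4(m - 2)$; the exhaustive check also confirms that the work bits return to their initial values whatever those values are.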

Note that in this construction and in the ones following, although many of the bits
not involved in the gate are operated upon, the gate operation is performed correctly
independent of the initial state of the bits (i.e., they do not have to be "cleared" to
0 first), and they are reset to their initial values after the operations of the gate (as
in the computations which occur in [41] and [40]). This fact makes constructions like
the following possible.

Lemma 7.3: For any $n \geq 5$ and $m \in \{2, \ldots, n - 3\}$, a $\wedge_{n-2}(\sigma_x)$ gate can be simulated
by a network consisting of two $\wedge_m(\sigma_x)$ gates and two $\wedge_{n-m-1}(\sigma_x)$ gates which is of
the form

[Figure: a nine-bit network of two $\wedge_m(\sigma_x)$ gates and two $\wedge_{n-m-1}(\sigma_x)$ gates.]

(illustrated for $n = 9$ and $m = 5$).

Proof: By inspection. □

Corollary 7.4: On an n-bit network (where $n \geq 7$), a $\wedge_{n-2}(\sigma_x)$ gate can be simulated
by $8(n - 5)$ $\wedge_2(\sigma_x)$ gates (3-bit Toffoli gates), as well as by $48n - 204$ basic operations.

Proof: First apply Lemma 7.2 with $m_1 = \lceil n/2 \rceil$ and $m_2 = n - m_1 - 1$ to simulate
$\wedge_{m_1}(\sigma_x)$ and $\wedge_{m_2}(\sigma_x)$ gates. Then combine these by Lemma 7.3 to simulate the
$\wedge_{n-2}(\sigma_x)$ gate. Then, each $\wedge_2(\sigma_x)$ gate in the above simulation may be simulated by
a set of basic operations (as in Corollary 6.2). We find that almost all of these Toffoli
gates need only to be simulated modulo phase factors as in Sec. 6.2; in particular,
only 4 of the Toffoli gates, the ones which involve the last bit in the diagram above,
need to be simulated exactly according to the construction of Corollary 6.2. Thus
these 4 gates are simulated by 16 basic operations, while the other $8n - 36$ Toffoli
gates are simulated in just 6 basic operations. A careful accounting of the mergers
of $\wedge_0$ gates which are then possible leads to the total count of basic operations given
above. □

The above constructions, though asymptotically efficient, requires at least one "extra" 
bit, in that an n-bit network is required to simulate the (n — l)-bit gate A n _ 2 (a x ). In 
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the next subsection, we shall show how to construct A n _i(C/) for an arbitrary unitary 
U using a quadratic number of basic operations on an n-bit network, which includes 
the n-bit Toffoli gate A n _i(cr x ) as a special case. 

7.2 Quadratic Simulation of General A n -i(U) Gates on n-Bit 
Networks 

Lemma 7.5: For any unitary 2x2 matrix U, a A n _i(C/) gate can he simulated by 
a network of the form 
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(illustrated for n = 9), where V is unitary 

Proof: The proof is very similar to that of Lemma 6.1, setting V so that V 2 = U.U 

Corollary 7.6: For any unitary U, a A n _i(C/) gate can he simulated in terms of 
6(n 2 ) basic operations. 

Proof: This is a recursive application of Lemma 7.5. Let $C_{n-1}$ denote the cost
of simulating a $\Lambda_{n-1}(U)$ (for an arbitrary U). Consider the simulation in Lemma
7.5. The cost of simulating the $\Lambda_1(V)$ and $\Lambda_1(V^\dagger)$ gates is $\Theta(1)$ (by Corollary 5.3).
The cost of simulating the two $\Lambda_{n-2}(\sigma_x)$ gates is $\Theta(n)$ (by Corollary 7.4). The cost
of simulating the $\Lambda_{n-2}(V)$ gate (by a recursive application of Lemma 7.5) is $C_{n-2}$.
Therefore, $C_{n-1}$ satisfies a recurrence of the form

$$C_{n-1} = C_{n-2} + \Theta(n),$$

which implies that $C_{n-1} \in \Theta(n^2)$. □

In fact, we find that using the gate-counting mentioned in Corollary 7.4, the number
of basic operations is $48n^2 + O(n)$.
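
A rough sketch of where the leading constant 48 comes from, under two assumptions: the only linear cost at each level of the recursion is the pair of $\Lambda_{n-2}(\sigma_x)$ gates, each costing $48n - 204$ basic operations as in Corollary 7.4, and everything else at that level is lumped into a constant $c$. The symbols $c$ and $n_0$ (the level at which the recursion is cut off, since Corollary 7.4 needs a minimum network size) are introduced here for illustration only.

```latex
\begin{align*}
C_{n-1} &= C_{n-2} + 2(48n - 204) + c \\
        &= C_{n_0-1} + \sum_{j=n_0+1}^{n} \bigl(96j - 408 + c\bigr) \\
        &= C_{n_0-1} + 48\bigl[n(n+1) - n_0(n_0+1)\bigr] + (c - 408)(n - n_0) \\
        &= 48n^2 + O(n).
\end{align*}
```

The quadratic term comes entirely from summing the $96j$ contributions; mergers of adjacent gates only change the lower-order terms.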

Although Corollary 7.6 is significant in that it permits any $\Lambda_{n-1}(U)$ to be simulated
with "polynomial complexity", the question remains as to whether a subquadratic
simulation is possible. The following is an $\Omega(n)$ lower bound on this complexity.

Lemma 7.7: Any simulation of a nonscalar $\Lambda_{n-1}(U)$ gate (i.e. where $U \neq \mathrm{Ph}(\delta) \cdot I$)
requires at least n − 1 basic operations.

Proof: Consider any n-bit network with arbitrarily many 1-bit gates and fewer than
n − 1 $\Lambda_1(\sigma_x)$ gates. Call two bits adjacent if there is a $\Lambda_1(\sigma_x)$ gate between
them, and connected if there is a sequence of consecutively adjacent bits between
them. Since there are fewer than n − 1 $\Lambda_1(\sigma_x)$ gates, it must be possible to partition
the bits into two nonempty sets A and B such that no bit in A is connected to any
bit in B. This implies that the unitary transformation associated with the network is
of the form $A \otimes B$, where A is $2^{|A|}$-dimensional and B is $2^{|B|}$-dimensional. Since the
transformation $\Lambda_{n-1}(U)$ is not of this form, the network cannot compute $\Lambda_{n-1}(U)$. □
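
The counting step in this proof is the standard fact that a graph on n vertices with fewer than n − 1 edges is disconnected. A minimal sketch, treating each two-bit gate as an edge between the bits it touches (the function name `components` is a hypothetical helper, not part of the paper):

```python
def components(n, gate_pairs):
    """Number of connected groups of bits, where each two-bit gate
    (a, b) makes bits a and b adjacent. Uses a simple union-find."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for a, b in gate_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb                 # each gate merges at most two groups
    return len({find(i) for i in range(n)})

# Three gates on five bits: at least 5 - 3 = 2 groups must remain.
groups_sparse = components(5, [(0, 1), (1, 2), (3, 4)])
# A chain of four gates is the minimum that can connect all five bits.
groups_chain = components(5, [(0, 1), (1, 2), (2, 3), (3, 4)])
```

Each gate lowers the number of groups by at most one, so k gates leave at least n − k groups; with k < n − 1 there are at least two, which gives the partition into A and B used above.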

It is conceivable that a linear size simulation of $\Lambda_{n-1}(U)$ gates is possible.
Although we cannot show this presently, in the remaining subsections, we show that
something "similar" (in a number of different senses) to a linear size simulation of
$\Lambda_{n-1}(U)$ gates is possible.

7.3 Linear Approximate Simulation of General $\Lambda_{n-1}(U)$ Gates on n-Bit Networks

Definition: We say that one network approximates another one within $\varepsilon$ if the
distance (induced by the Euclidean vector norm) between the unitary transformations
associated with the two networks is at most $\varepsilon$.

This notion of approximation in the context of reducing the complexity of quantum
computations was introduced by Coppersmith [33], and is useful for the following
reason. Suppose that two networks that are approximately the same (in the above
sense) are executed with identical inputs and their outputs are observed. Then the
probability distributions of the two outcomes will be approximately the same in the
sense that, for any event, its probability will differ by at most $2\varepsilon$ between the two
networks.
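
The $2\varepsilon$ statement can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the authors' code: it builds a random unitary U, perturbs it by small diagonal phases to get V, and compares the operator distance with the largest probability gap over all events, which equals the total variation distance between the two output distributions. All names here are hypothetical.

```python
import numpy as np

def random_unitary(dim, rng):
    """Random unitary from the QR factorization of a complex Gaussian matrix."""
    A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    Q, _ = np.linalg.qr(A)
    return Q

def operator_distance(U, V):
    """Distance induced by the Euclidean vector norm (largest singular value)."""
    return np.linalg.norm(U - V, ord=2)

def max_event_gap(U, V, psi):
    """Largest |P_U(event) - P_V(event)| over all sets of outcomes."""
    p = np.abs(U @ psi) ** 2
    q = np.abs(V @ psi) ** 2
    return 0.5 * np.sum(np.abs(p - q))

rng = np.random.default_rng(0)
U = random_unitary(8, rng)
# V differs from U by phases of at most 0.05 radians on each basis state.
V = U @ np.diag(np.exp(1j * rng.uniform(-0.05, 0.05, size=8)))
psi = rng.normal(size=8) + 1j * rng.normal(size=8)
psi /= np.linalg.norm(psi)

eps = operator_distance(U, V)
gap = max_event_gap(U, V, psi)
```

For any event with projector $P_S$, the gap is $\bigl|\,\|P_S U\psi\|^2 - \|P_S V\psi\|^2\bigr| \le 2\,\|P_S (U - V)\psi\| \le 2\varepsilon$, so `gap` never exceeds `2 * eps`.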

Lemma 7.8: For any unitary 2×2 matrix U and $\varepsilon > 0$, a $\Lambda_{n-1}(U)$ gate can be
approximated within $\varepsilon$ by $\Theta(n \log(1/\varepsilon))$ basic operations.

Proof: The idea is to apply Lemma 7.5 recursively as in Corollary 7.6, but to observe
that, with suitable choices for V, the recurrence can be terminated after $\Theta(\log(1/\varepsilon))$
levels.

Since U is unitary, there exist unitary matrices P and D such that $U = P^\dagger \cdot D \cdot P$
and

$$D = \begin{pmatrix} e^{i d_1} & 0 \\ 0 & e^{i d_2} \end{pmatrix},$$

where $d_1$ and $d_2$ are real; $e^{i d_1}$ and $e^{i d_2}$ are the eigenvalues of U. If $V_k$ is the matrix used
in the k-th recursive application of Lemma 7.5 ($k \in \{0, 1, 2, \ldots\}$) then it is sufficient
that $V_{k+1}^2 = V_k$ for each $k \in \{0, 1, 2, \ldots\}$. Thus, it suffices to set $V_k = P^\dagger \cdot D_k \cdot P$,
where

$$D_k = \begin{pmatrix} e^{i d_1/2^k} & 0 \\ 0 & e^{i d_2/2^k} \end{pmatrix}$$

for each $k \in \{0, 1, 2, \ldots\}$. Note that then

$$\begin{aligned}
\|V_k - I\|_2 &= \|P^\dagger \cdot D_k \cdot P - I\|_2 \\
              &= \|P^\dagger \cdot (D_k - I) \cdot P\|_2 \\
              &\le \|P^\dagger\|_2 \cdot \|D_k - I\|_2 \cdot \|P\|_2 \\
              &= \|D_k - I\|_2 \\
              &\le \pi/2^k.
\end{aligned}$$

Therefore, if the recursion is terminated after $k = \lceil \log_2(\pi/\varepsilon) \rceil$ steps then the discrepancy
between what the resulting network computes and $\Lambda_{n-1}(U)$ is an (n − k)-bit
transformation of the form $\Lambda_{n-k-1}(V_k)$. Since $\|\Lambda_{n-k-1}(V_k) - \Lambda_{n-k-1}(I)\|_2 = \|V_k - I\|_2 \le
\pi/2^{\lceil \log_2(\pi/\varepsilon) \rceil} \le \varepsilon$, the network approximates $\Lambda_{n-1}(U)$ within $\varepsilon$. □
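
The two facts used in this proof, $V_{k+1}^2 = V_k$ and $\|V_k - I\|_2 \le \pi/2^k$, can be checked numerically. The sketch below makes two assumptions that are not spelled out in the text: the eigenphases $d_1, d_2$ are taken in $(-\pi, \pi]$, and the eigenvector matrix returned by NumPy plays the role of $P^\dagger$. The names `v_k` and `levels_needed` are hypothetical.

```python
import math
import numpy as np

def random_unitary(dim, seed=0):
    """Random unitary from the QR factorization of a complex Gaussian matrix."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    Q, _ = np.linalg.qr(A)
    return Q

def v_k(U, k):
    """V_k = W diag(exp(i d_j / 2^k)) W^{-1}, with eigenphases d_j in (-pi, pi]."""
    lam, W = np.linalg.eig(U)
    d = np.angle(lam)
    return W @ np.diag(np.exp(1j * d / 2 ** k)) @ np.linalg.inv(W)

def levels_needed(eps):
    """Recursion depth k = ceil(log2(pi / eps)) from the proof of Lemma 7.8."""
    return math.ceil(math.log2(math.pi / eps))

U = random_unitary(2, seed=3)
I2 = np.eye(2)
# Distance of V_k from the identity for the first few levels.
norms = [np.linalg.norm(v_k(U, k) - I2, ord=2) for k in range(12)]
k_stop = levels_needed(0.01)
```

For $\varepsilon = 0.01$ the depth is 9, and the operations spent are proportional to n at each of those levels, which is the $n \log(1/\varepsilon)$ scaling claimed in the lemma.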



7.4 Linear Simulation in Special Cases 

Lemma 7.9: For any SU(2) matrix W, a $\Lambda_{n-1}(W)$ gate can be simulated by a
network of the form

[Figure: circuit diagram]

where A, B, and C $\in$ SU(2).

Proof: The proof is very similar to that of Lemma 5.1, referring to Lemma 4.3. □ 

Combining Lemma 7.9 with Corollary 7.4, we obtain the following. 

Corollary 7.10: For any $W \in SU(2)$, a $\Lambda_{n-1}(W)$ gate can be simulated by $\Theta(n)$
basic operations.

As in Section 5, a noteworthy example is when

[matrix $W$ not reproduced]

In this case, we obtain a linear simulation of a transformation congruent modulo
phase shifts to the n-bit Toffoli gate $\Lambda_{n-1}(\sigma_x)$.

7.5 Linear Simulation of General $\Lambda_{n-2}(U)$ Gates on n-Bit Networks With One Bit Fixed

Lemma 7.11: For any unitary U, a $\Lambda_{n-2}(U)$ gate can be simulated by an n-bit
network of the form

[Figure: circuit diagram]

(illustrated for n = 9), where the initial value of one bit (the second to last) is fixed
at 0 (and it incurs no net change).

Proof: By inspection. □ 

Combining Lemma 7.11 with Corollary 7.4, we obtain the following. 

Corollary 7.12: For any unitary U, a $\Lambda_{n-2}(U)$ gate can be simulated by $\Theta(n)$ basic
operations in an n-bit network, where the initial value of one bit is fixed and incurs no
net change.

Note that the "extra" bit above may be reused in the course of several simulations of
$\Lambda_m(U)$ gates.

8 Efficient general gate constructions

In this final discussion we will change the ground rules slightly by considering the 
"basic operation" to be any two-bit operation. This may or may not be a physically 
reasonable choice in various particular implementations of quantum computing, but 
for the moment this should be considered as just a mathematical convenience which 
will permit us to address somewhat more general questions than the ones considered 
above. When the arbitrary two-bit gate is taken as the basic operation, then as 
we have seen, 5 operations suffice to produce the Toffoli gate (recall Lemma 6.1), 3 
produce the Toffoli gate modulo phases (we permit a merging of the operations in the 
construction of Sec. 6.2), and 13 can be used to produce the 4-bit Toffoli gate (see 
Lemma 7.1). In no case do we have a proof that this is the most economical method 
for producing each of these functions; however, for most of these examples we have 
compelling evidence from numerical study that these are in fact minimal [Q. 

In the course of doing these numerical investigations we discovered a number of 
interesting additional facts about two-bit gate constructions. It is natural to ask, how 
many two-bit gates are required to perform any arbitrary three-bit unitary operation, 
if the two-bit gates were permitted to implement any member of U(4)? The answer 
is six, as in the gate arrangement shown here.

[Figure: six two-bit U(4) gates acting on three bits, labeled with the cumulative dimensionalities 16, 28, 37, 46, 55, 64]

We find an interesting regularity in how the U(8) operation is built up by this
sequence of gates, which is summarized by the "dimensionalities" shown in the diagram.
The first U(4) operation has $4^2 = 16$ free angle parameters; this is the dimensionality
of the space accessible with a single gate, as indicated. With the second gate, this
dimensionality increases only by 12, to 28. It does not double to 32, for two reasons.
First, there is a single global phase shared by the two gates. Second, there is a set
of operations acting only on the bit shared by the two gates, which accounts for the
additional reduction of 3. Formally, this is summarized by noting that 12 is the
dimension of the coset space SU(4)/SU(2). The action of the third gate increases the
dimensionality by another 9 = 16 − 1 − 3 − 3, where 9 is the dimension of the coset space
SU(4)/(SU(2)×SU(2)). The further subtraction by 3 results from the duplication of
one-bit operations on both bits of the added gate. At this point the dimensionality
increases by nine for each succeeding gate, until the dimensionality reaches exactly 64,
the dimension of U(8), at the sixth gate. In preliminary tests on four-bit operations,
we found that the same rules for the increase of dimensionality applied. This permits
us to make a conjecture, just based on dimension counting, of a lower bound on the
number of two-bit gates required to produce an arbitrary n-bit unitary transformation:
$\Omega(n) = \frac{1}{9} 4^n - \frac{1}{3} n - \frac{1}{9}$. It is clear that "almost all" unitary transformations will be
computationally uninteresting, since they will require exponentially many operations
to implement.
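
The dimension counting can be replayed in a few lines. The sketch below assumes the rule described above extends to n bits in the obvious way: the first gate contributes 16, each later gate that brings in a previously untouched bit contributes 12, and every other gate contributes 9. That extension, and the function names, are assumptions made for illustration.

```python
def min_two_bit_gates(n):
    """Smallest number of two-bit gates whose accumulated dimensionality
    reaches 4**n, the dimension of U(2^n), under the assumed 16/12/9 rule."""
    target = 4 ** n
    dim, gates, bits_touched = 16, 1, 2
    while dim < target:
        if bits_touched < n:
            dim += 12            # gate shares one bit and adds a fresh one
            bits_touched += 1
        else:
            dim += 9             # gate acts on two already-used bits
        gates += 1
    return gates

def conjectured_bound(n):
    """Closed form (4^n - 3n - 1) / 9, i.e. (1/9) 4^n - (1/3) n - 1/9."""
    return (4 ** n - 3 * n - 1) // 9

three_bit = min_two_bit_gates(3)
four_bit = min_two_bit_gates(4)
```

For n = 3 the loop passes through 16, 28, 37, 46, 55, 64 and stops at six gates, matching the figure; for n = 4 it gives 27, which agrees with the closed form.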

Finally, we mention that by combining the quantum gate constructions introduced
here with the decomposition formulas for unitary matrices as used by Reck et al. [15], an
explicit, exact simulation of any unitary operator on n bits can be constructed using
a finite number ($\Theta(n^3 4^n)$) of two-bit gates, and using no work bits. In outline, the
procedure is as follows: Reck et al. [15] note that a formula exists for the decomposition
of any unitary matrix into matrices only involving a U(2) operation acting in the space
of pairs of states (not bits):

$$U = \Bigl( \prod_{x_1, x_2 \in \{0,1\}^n,\; x_1 > x_2} T(x_1, x_2) \Bigr) \cdot D.$$

$T(x_1, x_2)$ performs a U(2) rotation involving the two basis states $x_1$ and $x_2$, and leaves
all other states unchanged; D is a diagonal matrix involving only phase factors, and
thus can also be thought of as a product of $2^{n-1}$ matrices which perform rotations in
two-dimensional subspaces. Using the methods introduced above, each $T(x_1, x_2)$ can
be simulated in polynomial time, as follows: write a Gray code connecting $x_1$ and $x_2$;
for example, if n = 8, $x_1$ = 00111010, and $x_2$ = 00100111:

1  00111010  $x_1$
2  00111011
3  00111111
4  00110111
5  00100111  $x_2$

Operations involving adjacent steps in this Gray code require a simple modification of
the $\Lambda_{n-1}$ gates introduced earlier. The (n − 1) control bits which remain unchanged
are not all 1 as in our earlier constructions, but they can be made so temporarily by
the appropriate use of "NOT" gates ($\Lambda_0(\sigma_x)$) before and after the application of the
$\Lambda_{n-1}$ operation. Now, the desired $T(x_1, x_2)$ operation is constructed as follows: first,
permute states down through the Gray code, performing the permutations (1,2), (2,3),
(3,4), ... (m−2, m−1). These numbers refer to the Gray code elements as in the table
above, where m, the number of elements in the Gray code, is 5 in the example. Each of
these permutations is accomplished by a modified $\Lambda_{n-1}(\sigma_x)$. Second, the desired U(2)
rotation is performed by applying a modified $\Lambda_{n-1}(U)$ involving the states (m − 1) and
(m). Third, the permutations are undone in reverse order: (m−2, m−1), (m−3, m−2), ...
(2,3), (1,2).
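
The three steps above can be sketched at the level of $2^n \times 2^n$ matrices. This is an illustration only: each modified $\Lambda_{n-1}$ gate is represented directly by the two-level matrix it implements rather than by a gate network, and the Gray code is built by flipping the differing bits from least to most significant, an ordering chosen here because it reproduces the table above. The names `gray_path`, `two_level` and `build_T` are hypothetical.

```python
import numpy as np

def gray_path(x1, x2, n):
    """Basis states from x1 to x2, flipping one differing bit per step."""
    path, cur = [x1], x1
    for b in range(n):                       # least significant bit first
        if ((x1 ^ x2) >> b) & 1:
            cur ^= 1 << b
            path.append(cur)
    return path

def two_level(dim, a, b, G):
    """Identity except for the 2x2 block G acting on basis states (a, b)."""
    M = np.eye(dim, dtype=complex)
    M[np.ix_([a, b], [a, b])] = G
    return M

def build_T(x1, x2, n, G):
    """Return T(x1, x2) built from Gray-code swaps, and the gate count 2m - 3."""
    path = gray_path(x1, x2, n)
    m, dim = len(path), 2 ** n
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    # Permutations (1,2), (2,3), ..., (m-2, m-1) in the paper's 1-based labels.
    swaps = [two_level(dim, path[i], path[i + 1], X) for i in range(m - 2)]
    total = np.eye(dim, dtype=complex)
    for S in swaps:
        total = S @ total
    # U(2) rotation on Gray code elements m-1 and m.
    total = two_level(dim, path[m - 2], path[m - 1], G) @ total
    for S in reversed(swaps):                # undo the permutations
        total = S @ total
    return total, 2 * m - 3

x1, x2, n = 0b00111010, 0b00100111, 8
G = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
T, gate_count = build_T(x1, x2, n, G)
```

For the example in the text the path has m = 5 elements, so the construction uses 2m − 3 = 7 modified gates, and the resulting matrix equals the direct two-level rotation on $x_1$ and $x_2$.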

The number of basic operations to perform all these steps may be easily estimated.
Each $T(x_1, x_2)$ involves 2m − 3 (modified) $\Lambda_{n-1}$ gates, each of which can be done in
$\Theta(n^2)$ operations. Since m, the number of elements in the Gray code sequence, cannot
exceed n + 1, the number of operations to simulate $T(x_1, x_2)$ is $\Theta(n^3)$. There are
$\Theta(4^n)$ T's in the product above, so the total number of basic operations to simulate
any $U(2^n)$ matrix exactly is $\Theta(n^3 4^n)$. (The number of steps to simulate the D matrix
is smaller and does not affect the count.) So, we see that this strict upper bound
differs only by a polynomial factor (which likely can be made better than $n^3$) from the
expected lower bound quoted earlier, so this Reck procedure is relatively "efficient"
(if something which scales exponentially may be termed so). A serious problem with
this procedure is that it is extremely unlikely, so far as we can tell, to provide a
polynomial-time simulation of those special $U(2^n)$ which permit it, which of course
are exactly the ones which are of most interest in quantum computation. It still
remains to find a truly efficient and useful design methodology for quantum gate
construction.